Agenda

  • Introduction to time series analysis and forecasting
  • Time series objects - introduction to the time series classes and their attributes
  • Descriptive analysis of time series
  • Linear regression-based forecasting models
  • The ARIMA family of models

Today we will focus mainly on methods for analyzing and forecasting regular time-series data with seasonal patterns.

Assumptions

  • Some background in R
  • Basic knowledge of probability
  • Familiarity with linear regression

Why R?

  • Statistical programming language
  • A large number of packages for time series analysis
  • The forecast package (and soon the fable package)

Goals

By the end of this workshop, you probably won’t become an expert in time series analysis and forecasting, but you will be able to:

  • Explore time series data with some basic tools
  • Use descriptive statistics for identifying seasonal and correlation patterns
  • Build a basic forecasting model

Admin

Workshop material

All of today’s slides, code, and R Markdown files are available on GitHub.

Downloading the workshop material from the terminal:

git clone https://github.com/RamiKrispin/Time-Series-Workshop.git

Or launch it from a Docker container:

Introduction to time series analysis

Time series analysis is commonly used in many fields of science, such as economics, finance, physics, engineering, and astronomy. The use of time series analysis to understand past events and to predict future ones did not start with the introduction of stochastic processes during the past century. Ancient civilizations such as the Greeks, Romans, and Mayans studied cyclical events, such as weather and astronomical phenomena, and learned to use them to predict future events.

Time series analysis is the art of extracting meaningful insights from time-series data to learn about past events and to predict future events.

This process includes the following steps:

  • Data collection - pulling the raw data from a database, API, flat files, etc.
  • Data prep - cleaning, reformatting (dates, classes, etc.), aggregating
  • Descriptive analysis - using statistical methods and data visualization tools to extract insights and learn about the series components and patterns
  • Predictive analysis - leveraging the insights learned in the descriptive process and applying a predictive model

Generally, in R this process will look like this:

Of course, there are more great packages that could be part of this process, such as zoo, xts, bsts, forecastHybrid, TSstudio, etc.
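As a rough illustration of the four steps above, here is a minimal sketch that uses only base R and the built-in AirPassengers dataset. The dataset, the cut-off year, and the model orders are assumptions chosen for the example, and the stats functions stand in for the packages used in the workshop:

```r
# 1. Data collection: AirPassengers ships with R (datasets package)
data(AirPassengers)

# 2. Data prep: keep only the full yearly cycles from 1955 onward
series <- window(AirPassengers, start = c(1955, 1))

# 3. Descriptive analysis: split the series into trend, seasonal,
#    and random components
components <- decompose(series, type = "multiplicative")

# 4. Predictive analysis: a seasonal ARIMA model on the log scale
#    (orders picked for illustration, not tuned)
fit <- arima(log(series),
             order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast the next 12 months and transform back to the original scale
fc <- predict(fit, n.ahead = 12)
forecast_values <- exp(fc$pred)
```

The forecast is itself a ts object, so it keeps the monthly frequency and starts right after the last observation of the input series.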

Time series data

Time series data is a sequence of values, each associated with a unique point in time. It can be divided into the following two groups:

  • Regular time series - is a sequence of observations which were captured at equally spaced time intervals (e.g., every month, week, day, hour, etc.)
  • Irregular time series - or unevenly spaced time series, is a sequence of observations which were not captured at equally spaced time intervals (for example rainy days, earthquakes, clinical trials, etc.)

Note: typically, the term time series data refers to regular time-series data. Therefore, unless stated otherwise, throughout the workshop the term time series (or series) refers to regular time-series data.
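One simple way to tell the two groups apart is to look at the gaps between consecutive timestamps. The sketch below uses a hypothetical helper, is_regular (not part of any package), and made-up timestamps. It compares gaps in seconds, so calendar-based intervals such as months would need a different check:

```r
# Hypothetical helper: TRUE when all gaps between consecutive
# timestamps are identical
is_regular <- function(timestamps) {
  gaps <- diff(as.numeric(timestamps))
  length(unique(gaps)) <= 1
}

# A regular series index: one timestamp every hour
hourly <- seq(from = as.POSIXct("2020-01-01 00:00", tz = "UTC"),
              by = "hour",
              length.out = 24)

# An irregular series index: made-up event times
events <- as.POSIXct(c("2020-01-01 03:10",
                       "2020-01-04 17:45",
                       "2020-02-11 08:00"),
                     tz = "UTC")

regular_flag <- is_regular(hourly)
irregular_flag <- is_regular(events)
```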

Examples of time series data

Applications

With time series analysis, you can answer questions such as:

  • Approximately how many vehicles are going to be sold in the US in the next 12 months?
  • What will be the estimated demand for natural gas in the US in the next five years?
  • Generally, what will be the demand for electricity in the UK during the next 24 hours?

Time series objects

There are multiple classes in R for time-series data. The most common are:

  • The ts class for regular time-series data and the mts class for multiple time series objects; ts is the most common class for time series data
  • The xts and zoo classes for both regular and irregular time series data, mainly popular in the financial field
  • The tsibble class, a tidy format for time series data that supports both regular and irregular time-series data

The attributes of a time series object

A typical time series object should have the following attributes:

  • A vector or matrix object with sequential observations
  • Index or timestamp
  • Frequency units
  • Cycle units

Here the frequency of the series is the number of observations per cycle, and the frequency units are the positions within that cycle. For example, for a monthly series, the frequency units are the months of the year (a frequency of 12), and the cycle units are the years. Similarly, for a daily series, the frequency units could be the days of the year (a frequency of 365), and the cycle units are again the years.
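To make the distinction concrete, the following sketch builds two small placeholder series with base R (the values are just counters, not real data) and compares their frequency and cycle positions:

```r
# Two years of monthly observations: 12 frequency units per yearly cycle
monthly <- ts(1:24, start = c(2018, 1), frequency = 12)

# Two years of quarterly observations: 4 frequency units per yearly cycle
quarterly <- ts(1:8, start = c(2018, 1), frequency = 4)

# Number of observations per cycle
monthly_freq <- frequency(monthly)
quarterly_freq <- frequency(quarterly)

# Position of each observation within its cycle (quarter of the year)
quarter_positions <- cycle(quarterly)
```

Both series share the same cycle unit (the year) and differ only in how many frequency units each cycle contains.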

The stats package provides a set of functions for handling and extracting information from a ts object. The frequency and cycle functions, as their names imply, return the frequency of the series and the position of each observation within its cycle, respectively. Let’s load the USgas series from the TSstudio package and apply those functions:

library(TSstudio)
data(USgas)

class(USgas)
## [1] "ts"
is.ts(USgas)
## [1] TRUE
frequency(USgas)
## [1] 12
cycle(USgas)
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2000   1   2   3   4   5   6   7   8   9  10  11  12
## 2001   1   2   3   4   5   6   7   8   9  10  11  12
## 2002   1   2   3   4   5   6   7   8   9  10  11  12
## 2003   1   2   3   4   5   6   7   8   9  10  11  12
## 2004   1   2   3   4   5   6   7   8   9  10  11  12
## 2005   1   2   3   4   5   6   7   8   9  10  11  12
## 2006   1   2   3   4   5   6   7   8   9  10  11  12
## 2007   1   2   3   4   5   6   7   8   9  10  11  12
## 2008   1   2   3   4   5   6   7   8   9  10  11  12
## 2009   1   2   3   4   5   6   7   8   9  10  11  12
## 2010   1   2   3   4   5   6   7   8   9  10  11  12
## 2011   1   2   3   4   5   6   7   8   9  10  11  12
## 2012   1   2   3   4   5   6   7   8   9  10  11  12
## 2013   1   2   3   4   5   6   7   8   9  10  11  12
## 2014   1   2   3   4   5   6   7   8   9  10  11  12
## 2015   1   2   3   4   5   6   7   8   9  10  11  12
## 2016   1   2   3   4   5   6   7   8   9  10  11  12
## 2017   1   2   3   4   5   6   7   8   9  10  11  12
## 2018   1   2   3   4   5   6   7   8   9  10  11  12
## 2019   1   2   3   4   5   6   7

The time function returns the series index or timestamp:

head(time(USgas))
## [1] 2000.0000 2000.0833 2000.1667 2000.2500 2000.3333 2000.4167

The deltat function returns the length of the time interval between two consecutive observations (which is equivalent to 1/frequency):

deltat(USgas)
## [1] 0.083333333

Similarly, the start and end functions return the starting and ending time of the series, respectively:

start(USgas)
## [1] 2000    1
end(USgas)
## [1] 2019    7

Here, the left number represents the cycle units (the year), and the right number represents the frequency units (the month) of the series. The tsp function returns the start and end of the series along with its frequency:

tsp(USgas)
## [1] 2000.0 2019.5   12.0
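The functions above are closely related: the index of observation i is the start time plus (i - 1) / frequency. The sketch below checks this on a placeholder monthly series with the same span as USgas (the values are just 1 to 235, not the real data), so it runs with base R only:

```r
# Placeholder monthly series, January 2000 to July 2019
x <- ts(seq_len(235), start = c(2000, 1), frequency = 12)

# Start, end, and frequency in a single vector
x_tsp <- tsp(x)

# Rebuild the index by hand: start + (i - 1) / frequency
manual_index <- x_tsp[1] + (seq_along(x) - 1) / frequency(x)

# Compare with the index returned by time()
index_matches <- isTRUE(all.equal(as.numeric(time(x)), manual_index))

# deltat() is the step between consecutive index values
step_matches <- isTRUE(all.equal(deltat(x), 1 / frequency(x)))
```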

Last but not least, the ts_info function from the TSstudio package returns a concise summary of the series:

ts_info(USgas)
##  The USgas series is a ts object with 1 variable and 235 observations
##  Frequency: 12 
##  Start time: 2000 1 
##  End time: 2019 7

Creating a ts object

The ts function creates a ts object from a single vector and an mts object from multiple vectors (or a matrix). Given the start (or end) and the frequency of the series, the function generates the index of the object. In the following example, we will load the US_indicators dataset from the TSstudio package and convert it to an mts object. US_indicators is a data.frame with the monthly vehicle sales and unemployment rate in the US since 1976:

data(US_indicators)

head(US_indicators)
##         Date Vehicle Sales Unemployment Rate
## 1 1976-01-31         885.2               8.8
## 2 1976-02-29         994.7               8.7
## 3 1976-03-31        1243.6               8.1
## 4 1976-04-30        1191.2               7.4
## 5 1976-05-31        1203.2               6.8
## 6 1976-06-30        1254.7               8.0

# Convert the two numeric columns to a monthly mts object starting in January 1976
mts_obj <- ts(data = US_indicators[, c("Vehicle Sales", "Unemployment Rate")],
              start = c(1976, 1),
              frequency = 12)

ts_info(mts_obj)
##  The mts_obj series is a mts object with 2 variables and 524 observations
##  Frequency: 12 
##  Start time: 1976 1 
##  End time: 2019 8
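The same function covers the single-vector case and the use of end instead of start. A minimal sketch with made-up quarterly values (illustrative numbers and hypothetical object names, not a real dataset):

```r
# Illustrative values only (not real data): 12 quarters
values <- c(10, 12, 15, 11, 13, 16, 12, 14, 18, 13, 15, 19)

# Single vector -> ts object, quarterly data starting in Q1 2017
ts_obj <- ts(data = values, start = c(2017, 1), frequency = 4)

# Same data defined by its end point; the start is derived from
# the end, the frequency, and the number of observations
ts_by_end <- ts(data = values, end = c(2019, 4), frequency = 4)

# Matrix with two columns -> mts object
mts_small <- ts(data = cbind(series_a = values, series_b = rev(values)),
                start = c(2017, 1),
                frequency = 4)

start(ts_by_end)
inherits(mts_small, "mts")
```

Defining the series by its end point is convenient when the date of the most recent observation is known but the start is not.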

How to define the start point of a series?

DT::datatable(data.frame(frequency = c(4, 12, 52, 365),
                         cycle_units = c("Year", "Year", "Year", "Year") ,
                         frequency_units = c("Quarter", "Month", "Week", "Day of the year")))
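The start argument is a pair made of the cycle unit and the frequency unit from the table above. The sketch below derives that pair from the date of a first observation using base R. The date is made up, and the week number uses one simple convention (seven-day blocks counted from January 1st), which is an assumption rather than a standard:

```r
# Made-up date of the first observation
first_obs <- as.Date("2019-08-15")
lt <- as.POSIXlt(first_obs)

# Cycle unit: the year
year <- lt$year + 1900

# Frequency units for each frequency in the table
start_quarterly <- c(year, lt$mon %/% 3 + 1)  # frequency = 4
start_monthly   <- c(year, lt$mon + 1)        # frequency = 12
start_weekly    <- c(year, lt$yday %/% 7 + 1) # frequency = 52 (simple convention)
start_daily     <- c(year, lt$yday + 1)       # frequency = 365

# Use the pair as the start argument of ts()
daily_series <- ts(1:10, start = start_daily, frequency = 365)
start(daily_series)
```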